Chapter 4 

Nonlinear equations 



4.1 Root finding 

Consider the problem of solving any nonlinear relation $g(x) = h(x)$ in the
real variable $x$. We rephrase this problem as one of finding the zero (root)
of a function, here $f(x) = g(x) - h(x)$. The minimal assumption we need on
$f$, $g$, $h$ is that they are continuous.

We have at our disposal any number of evaluations of $f$ and its derivative
$f'$.

1. Method 1: bisection. The bisection method starts from two points
$a_0$ and $b_0$ such that

$$f(a_0) > 0, \quad \text{and} \quad f(b_0) < 0.$$

Because $f$ is continuous, there must exist a root in the interval
$[a_0, b_0]$. At stage $k$, assume that we have obtained an interval
$[a_k, b_k]$ such that the same sign properties hold: $f(a_k) > 0$ and
$f(b_k) < 0$. The bisection method consists in subdividing the interval
$[a_k, b_k]$ in two and discarding the half in which there may not be a
root. Let $m_k = (a_k + b_k)/2$.

• If $f(m_k) < 0$, then it is the interval $[a_k, m_k]$ which is of interest.
We put $a_{k+1} = a_k$ and $b_{k+1} = m_k$.

• If $f(m_k) > 0$, then it is the interval $[m_k, b_k]$ which is of interest.
We put $a_{k+1} = m_k$ and $b_{k+1} = b_k$.

• If $f(m_k) = 0$, then $m_k$ is a root and we stop the algorithm.

In practice, this iteration is stopped once $f(m_k)$ gets small enough. Let
$x^*$ be the unknown root. The error obeys

$$|x^* - m_k| \le |b_k - a_k| = 2^{-k} |b_0 - a_0|.$$



Every step of the bisection discovers a new correct digit in the binary 
expansion of x*. 

The advantage of the bisection method is that it is guaranteed to
converge to a root, by construction. On the other hand, convergence is
rather slow compared to the next two methods we now present. If there
are several roots, the bisection method will converge toward one of them
(we may have no control over which root the method chooses).
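
The bisection steps above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name `bisect`,
the tolerance `tol`, the cap `max_iter`, and the test function
$f(x) = 2 - x^2$ are all choices made here, not part of the notes. The
sketch also stops on the width of the interval rather than on the size of
$f(m_k)$.

```python
def bisect(f, a, b, tol=1e-12, max_iter=200):
    """Bisection with the sign convention of the text: f(a) > 0 and f(b) < 0.

    tol and max_iter are illustrative choices, not prescribed by the notes.
    """
    if not (f(a) > 0 and f(b) < 0):
        raise ValueError("need f(a) > 0 and f(b) < 0")
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        fm = f(m)
        # Stop on an exact root, or once the bracketing interval is tiny.
        if fm == 0 or abs(b - a) < tol:
            return m
        if fm < 0:
            b = m  # the sign change is in [a, m]
        else:
            a = m  # the sign change is in [m, b]
    return 0.5 * (a + b)


# Hypothetical test problem: f(x) = 2 - x^2 has f(1) > 0 and f(2) < 0.
root = bisect(lambda x: 2 - x * x, 1.0, 2.0)
print(root)
```

Since the interval is halved at every step, roughly 40 iterations are needed
to shrink an interval of length 1 below $10^{-12}$.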

2. Method 2: Newton-Raphson. This method is very important: it is 
the basis of most optimization solvers in science and engineering. Let 
us first present the Newton-Raphson method for solving a single scalar 
equation f(x) = 0. 

Newton's method fits a tangent line to the point $(x_n, f(x_n))$ on the
graph of $f$, and defines $x_{n+1}$ at the intersection of this tangent line
with the $x$ axis. We have

$$0 = f(x_n) + (x_{n+1} - x_n) f'(x_n),$$

from which we isolate

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
For instance, we can find the decimal expansion of $\sqrt{2}$ by finding the
positive root of $f(x) = x^2 - 2$. The iteration reads

$$x_{n+1} = x_n - \frac{x_n^2 - 2}{2 x_n} = \frac{x_n}{2} + \frac{1}{x_n}.$$

Starting with $x_0 = 1$, we get $x_1 = \frac{3}{2} = 1.5$,
$x_2 = \frac{17}{12} = 1.4167...$, $x_3 = \frac{577}{408} = 1.4142157...$
The true value of $\sqrt{2}$ is 1.4142135...
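
The rational iterates of this example can be reproduced exactly with
Python's `fractions` module. This is a small sketch; the helper name
`newton_sqrt2_step` is introduced here for illustration only.

```python
from fractions import Fraction


def newton_sqrt2_step(x):
    # One Newton step for f(x) = x^2 - 2, written as x/2 + 1/x.
    return x / 2 + 1 / x


x = Fraction(1)
iterates = [x]
for _ in range(3):
    x = newton_sqrt2_step(x)
    iterates.append(x)

for n, xn in enumerate(iterates):
    print(n, xn, float(xn))
```

Working with exact fractions makes it easy to compare with the values
3/2, 17/12 and 577/408 quoted in the text.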

Convergence is very fast, when it occurs. Assume that $f''$ is continuous,
and that $f'(x) \ne 0$ in some neighborhood of the root $x^*$ (large enough
so that all our iterates stay in this neighborhood). Put $e_n = x_n - x^*$.
Then we can perform a Taylor expansion of $f$ around $x_n$, and evaluate
it at $x = x^*$:

$$0 = f(x^*) = f(x_n) + (x^* - x_n) f'(x_n) + \frac{1}{2} (x^* - x_n)^2 f''(\xi),$$

for some $\xi \in \mathrm{int}(x_n, x^*)$ (the notation int refers to the
interval generated by $x_n$ and $x^*$, i.e., either $[x_n, x^*]$ or
$[x^*, x_n]$). We also have the equation defining $x_{n+1}$:

$$0 = f(x_n) + (x_{n+1} - x_n) f'(x_n).$$

Subtracting those two equations, we get

$$0 = -e_{n+1} f'(x_n) + \frac{1}{2} e_n^2 f''(\xi),$$

$$e_{n+1} = \frac{1}{2} \frac{f''(\xi)}{f'(x_n)} e_n^2.$$

Our assumptions ensure that the ratio $f''(\xi)/f'(x_n)$ exists and
converges to some limit ($f''(x^*)/f'(x^*)$) as $n \to \infty$. Hence the
sequence is bounded uniformly in $n$, and we can write

$$|e_{n+1}| \le C e_n^2,$$

where $C > 0$ is some number (which depends on $f$ but not on $n$). It
follows that

$$|e_n| \le \frac{1}{C} (C |e_0|)^{2^n}.$$

We say the method "converges quadratically" because the exponent of
$e_n$ is 2. The number of correct digits roughly doubles at each iteration.
In contrast, the bisection method only converges linearly. We also
sometimes refer to "linear convergence" as first-order convergence, although
the meaning of the expression is completely different from what it was
in the previous chapters.
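
Quadratic convergence can be observed numerically by tracking the ratio
$e_{n+1}/e_n^2$, which should approach $f''(x^*)/(2 f'(x^*))$. The sketch
below does this for $f(x) = x^2 - 2$, where that limit is $1/(2\sqrt{2})$.
The variable names are illustrative, and only a few iterations are run
because the error quickly reaches the level of floating-point rounding.

```python
import math

x_star = math.sqrt(2.0)  # reference value of the root
x = 1.0
errors = [x - x_star]
for _ in range(4):
    x = x / 2 + 1 / x  # Newton step for f(x) = x^2 - 2
    errors.append(x - x_star)

# Ratios e_{n+1} / e_n^2; these should settle near f''(x*) / (2 f'(x*)).
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(4)]
limit = 1.0 / (2.0 * x_star)

print(ratios)
print(limit)
```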

Convergence is ensured as soon as the starting point $x_0$ is close enough
to the (unknown) root $x^*$, in the sense that $|C e_0| < 1$, so that
$(C e_0)^{2^n} \to 0$ as $n \to \infty$. If the condition $|C e_0| < 1$ is
not satisfied, Newton's method may very well diverge. For instance, we
expect problems when the derivative is very small: following the tangent
can take us to a region very far away from the root. An example of a
function $f(x)$ for which Newton's method diverges is $\mathrm{atan}(x)$,
when $x_0$ is chosen to be too far from the origin.

On the plus side, Newton's method is fast. On the minus side, Newton's
method converges to a root only when you're already quite close to it.
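
The sensitivity to the starting point can be tried out on $\mathrm{atan}(x)$.
The sketch below is a minimal Newton loop, with assumptions made here for
illustration: the function name `newton`, the tolerance `tol`, the threshold
`blowup` used to declare divergence, and the two starting points 1 and 2.

```python
import math


def newton(f, fprime, x0, tol=1e-12, max_iter=50, blowup=1e8):
    """Newton's method; returns (last iterate, converged flag).

    tol, max_iter and blowup are illustrative choices.
    """
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            return x, True
        x = x - f(x) / fprime(x)
        if abs(x) > blowup:
            return x, False  # iterates are running away from the root
    return x, False


def atan_prime(x):
    return 1.0 / (1.0 + x * x)


x_near, ok_near = newton(math.atan, atan_prime, 1.0)
x_far, ok_far = newton(math.atan, atan_prime, 2.0)
print(x_near, ok_near)
print(x_far, ok_far)
```

With these choices the first run settles at the root $x = 0$, while the
second produces iterates of rapidly growing magnitude.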

3. Method 3: the secant method. 

If we do not know the derivative, we cannot set up Newton's method,
but we can approximate it by replacing the derivative by (let
$f_n = f(x_n)$)

$$f[x_{n-1}, x_n] = \frac{f_n - f_{n-1}}{x_n - x_{n-1}}.$$

Hence we define $x_{n+1}$ by

$$x_{n+1} = x_n - \frac{f_n}{f[x_{n-1}, x_n]}.$$

The geometrical idea is to replace the tangent line at $x_n$ by the secant
line supported by $x_{n-1}$ and $x_n$. The secant method requires two points
$x_0$ and $x_1$ as starting guesses.

Notice that at each step, only one evaluation of $f$ is necessary, because
$f(x_{n-1})$ is already known from the previous iteration. If we were to
form a finite difference approximation of the derivative with a very small
grid step $h$, we may be more accurate but that requires two evaluations of
$f$ rather than one.

Let us check the convergence properties of the secant method. The
line joining the two points $(x_{n-1}, f_{n-1})$ and $(x_n, f_n)$ is the
degree-1 interpolant in the interval $[x_{n-1}, x_n]$:

$$p(x) = f_n + f[x_{n-1}, x_n](x - x_n).$$

Outside of this interval, it is an extrapolant. Regardless of whether
$x \in [x_{n-1}, x_n]$ or not, the difference between $p$ and $f$ is known
from a theorem we saw in the previous chapter:

$$f(x) - p(x) = \frac{1}{2} f''(\xi)(x - x_n)(x - x_{n-1}),$$

where $\xi$ is in the smallest interval containing $x$, $x_{n-1}$, and
$x_n$. Evaluating this relation at the root $x = x^*$, we get

$$0 = f_n + f[x_{n-1}, x_n](x^* - x_n) + \frac{1}{2} f''(\xi)(x^* - x_n)(x^* - x_{n-1}).$$

On the other hand the definition of $x_{n+1}$ gives

$$0 = f_n + f[x_{n-1}, x_n](x_{n+1} - x_n).$$

Subtracting these two equations we get

$$e_{n+1} = \frac{1}{2} \frac{f''(\xi)}{f[x_{n-1}, x_n]} e_n e_{n-1}.$$

Again, thanks to the same assumptions on $f$ as in Newton's method,
the ratio $f''(\xi)/f[x_{n-1}, x_n]$ has a finite limit as $n \to \infty$,
hence is bounded by some number $C > 0$. We get

$$|e_{n+1}| \le C |e_n| |e_{n-1}|.$$

The decay of $e_n$ is somewhere between first (linear) and second
(quadratic) order. To obtain a more precise rate of decay, we guess that
the inequality above should be reducible to the form
$|e_n| \le C |e_{n-1}|^p$ for some $p$. Using this equation and
$|e_{n+1}| \le C |e_n|^p$ above, so that $|e_{n+1}|$ scales like
$|e_{n-1}|^{p^2}$ up to constants, we get

$$|e_{n-1}|^{p^2} \le C |e_{n-1}|^p |e_{n-1}|.$$

The exponents match on the left and the right provided $p^2 = p + 1$,
which has for positive solution

$$p = \frac{1 + \sqrt{5}}{2}$$

(a number sometimes called the golden ratio). We check that
$p = 1.618...$, a number between 1 and 2. Hence the secant method is faster
than bisection, but slower than Newton's method.

The secant method inherits the problem of Newton's method: it only
converges when the starting guesses $x_0$ and $x_1$ are sufficiently close
to the root.
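
A minimal sketch of the secant iteration follows. The function name
`secant`, the tolerance `tol`, the cap `max_iter`, and the test problem
$f(x) = x^2 - 2$ with starting guesses 1 and 2 are assumptions made for
illustration. Note that each pass through the loop calls `f` exactly once,
as remarked in the text.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method; tol and max_iter are illustrative choices."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        # Stop when the residual is small, or when the secant line is flat.
        if abs(f1) < tol or f1 == f0:
            break
        # x_{n+1} = x_n - f_n / f[x_{n-1}, x_n]
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)  # the only new evaluation of f in this step
    return x1


root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)
```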

We can also set up Newton's method in several dimensions. A system of
nonlinear equations is of the form

$$f_i(x_1, \ldots, x_n) = 0, \qquad i = 1, \ldots, n.$$

We take the same number of equations and unknowns, so that we may be in
a situation where there is one solution (rather than a continuum of solutions
or no solution at all). Whether the system has zero, one or several solutions
is still a question that needs to be addressed separately. The shorthand
notation for the system is $\mathbf{f}(\mathbf{x}) = 0$.

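By analogy with the 1D case we perform a Taylor expansion about
$\mathbf{x}_n$:

$$0 = \mathbf{f}(\mathbf{x}^*) = \mathbf{f}(\mathbf{x}_n) + \nabla \mathbf{f}(\mathbf{x}_n)(\mathbf{x}^* - \mathbf{x}_n) + O(\|\mathbf{x}^* - \mathbf{x}_n\|^2).$$

With indices this equation is written as

$$0 = f_i(\mathbf{x}^*) = f_i(\mathbf{x}_n) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(\mathbf{x}_n)(x_j^* - x_{j,n}) + O\Big(\sum_j (x_{j,n} - x_j^*)^2\Big).$$

(Watch the subscript $n$ which indicates the $n$-th iterate while the
subscript $j$ indicates the $j$-th component.) The next iterate
$\mathbf{x}_{n+1}$ is defined by neglecting the quadratic error and
isolating $\mathbf{x}^*$. A linear system of equations needs to be solved:
the Jacobian matrix $\nabla \mathbf{f}(\mathbf{x}_n)$ is inverted and we get

$$\mathbf{x}_{n+1} = \mathbf{x}_n - [\nabla \mathbf{f}(\mathbf{x}_n)]^{-1} \mathbf{f}(\mathbf{x}_n).$$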

The geometrical interpretation of this equation is that we can fit the
tangent plane to each of the surfaces $y = f_i(x_1, \ldots, x_n)$ in
$\mathbb{R}^{n+1}$, find the line at the intersection of all these planes,
and check where this line intersects the (hyper)plane $y = 0$.

Newton's method is still quadratically convergent in multiple dimensions, 
and special care must still be taken to make sure that we start close enough 
to a root. 

Example 10.

$$x_1^2 + x_2^2 = 1, \qquad x_2 = \sin(x_1).$$

Write this as a root-finding problem: $f_1 = f_2 = 0$ with

$$f_1(x_1, x_2) = x_1^2 + x_2^2 - 1, \qquad f_2(x_1, x_2) = x_2 - \sin(x_1).$$
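
The Jacobian matrix is

$$J = \nabla \mathbf{f}(\mathbf{x}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2x_1 & 2x_2 \\ -\cos(x_1) & 1 \end{pmatrix}.$$

Use the formula for the inverse of a 2-by-2 matrix:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

to obtain

$$J^{-1} = \frac{1}{2x_1 + 2x_2 \cos(x_1)} \begin{pmatrix} 1 & -2x_2 \\ \cos(x_1) & 2x_1 \end{pmatrix}.$$

The Newton iteration is therefore

$$\begin{pmatrix} x_{1,n+1} \\ x_{2,n+1} \end{pmatrix} = \begin{pmatrix} x_{1,n} \\ x_{2,n} \end{pmatrix} - J^{-1} \begin{pmatrix} x_{1,n}^2 + x_{2,n}^2 - 1 \\ x_{2,n} - \sin x_{1,n} \end{pmatrix},$$

where $J^{-1}$ is evaluated at $(x_{1,n}, x_{2,n})$.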
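
The iteration of Example 10 can be sketched directly from the 2-by-2
inverse formula. The starting guess $(1, 1)$, the fixed count of 20
iterations, and the name `newton_step` are assumptions made here for
illustration; the example itself does not specify a starting point.

```python
import math


def newton_step(x1, x2):
    # Residuals f1, f2 of the system in Example 10.
    f1 = x1 * x1 + x2 * x2 - 1.0
    f2 = x2 - math.sin(x1)
    # Jacobian entries: [[a, b], [c, d]].
    a, b = 2.0 * x1, 2.0 * x2
    c, d = -math.cos(x1), 1.0
    det = a * d - b * c
    # Apply the 2-by-2 inverse formula to (f1, f2).
    dx1 = (d * f1 - b * f2) / det
    dx2 = (-c * f1 + a * f2) / det
    return x1 - dx1, x2 - dx2


x1, x2 = 1.0, 1.0  # assumed starting guess
for _ in range(20):
    x1, x2 = newton_step(x1, x2)

print(x1, x2)
```

For larger systems one would normally solve the linear system at each step
rather than form the inverse explicitly; the explicit formula is used here
only because it mirrors the example.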

4.2 Optimization problems 

Another important recurring problem in science and engineering is that of
finding a minimum or a maximum of a function $F(x)$. A point $x^*$ is a local
minimum when $F(y) \ge F(x^*)$ for all $y$ in a neighborhood of $x^*$. It is
a global minimum when $F(y) \ge F(x^*)$ for all $y$. We write

$$\min_x F(x)$$

for the minimum value $F(x^*)$. We then call $x^*$ the argument of the
minimum. Maximizing $F(x)$ instead is the same as minimizing $-F(x)$, so it
suffices to talk about minimization.

When $F(x)$ is smooth, and $x$ is allowed to run over all real numbers (not
restricted to an interval or other set), then it suffices to solve
$F'(x) = 0$ (and check that $F''(x) > 0$) in order to find a local minimum.
Hence it suffices to apply Newton's method or any other root-finding method
to the function $f(x) = F'(x)$. We obtain

$$x_{n+1} = x_n - \frac{F'(x_n)}{F''(x_n)}.$$
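
As a small illustration of this one-dimensional iteration, the sketch below
minimizes the hypothetical function $F(x) = e^x - 2x$, chosen here because
$F'(x) = e^x - 2$ and $F''(x) = e^x > 0$ are simple and the minimizer is
known to be $\log 2$. The function name `newton_minimize`, the tolerance and
the starting point 0 are likewise assumptions.

```python
import math


def newton_minimize(dF, d2F, x0, tol=1e-12, max_iter=50):
    """Newton's method applied to F'(x) = 0; tol and max_iter are illustrative."""
    x = x0
    for _ in range(max_iter):
        step = dF(x) / d2F(x)
        x = x - step
        if abs(step) < tol:
            break
    return x


# Hypothetical example: F(x) = exp(x) - 2x, minimized at x = log(2).
x_min = newton_minimize(lambda x: math.exp(x) - 2.0,
                        lambda x: math.exp(x),
                        0.0)
print(x_min, math.log(2.0))
```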

In multiple dimensions, we minimize a scalar function
$F(x_1, \ldots, x_n)$. The optimality condition, obeyed at the minimum
$x_1^*, \ldots, x_n^*$, is that all the partial derivatives of $F$ vanish,
i.e.,

$$\nabla F(x_1^*, \ldots, x_n^*) = 0.$$

Newton's method, also called Newton descent, follows from considering these
equations as a nonlinear system $f_i(x_1, \ldots, x_n) = 0$ with
$f_i = \frac{\partial F}{\partial x_i}$. We get

$$\mathbf{x}_{n+1} = \mathbf{x}_n - [\nabla\nabla F(\mathbf{x}_n)]^{-1} \nabla F(\mathbf{x}_n).$$

The matrix $\nabla\nabla F$ of second partial derivatives of $F$ is called
the Hessian. In index notation,

$$(\nabla\nabla F)_{ij} = \frac{\partial^2 F}{\partial x_i \partial x_j}.$$

Compare Newton's method with simple gradient descent:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha \nabla F(\mathbf{x}_n),$$

for some sufficiently small scalar $\alpha$. Gradient descent is slower but
typically converges from a larger set of initial guesses than Newton's
method.

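Example 11. Consider

$$F(x_1, x_2) = x_1^2 + (\log x_2)^2.$$

This function has a unique minimum for $x_1 \in \mathbb{R}$ and $x_2 > 0$.
We compute

$$\nabla F(x_1, x_2) = \begin{pmatrix} 2x_1 \\ \frac{2 \log x_2}{x_2} \end{pmatrix}$$

and

$$\nabla\nabla F(x_1, x_2) = \begin{pmatrix} 2 & 0 \\ 0 & \frac{2 - 2\log x_2}{x_2^2} \end{pmatrix}.$$

Newton's iteration is therefore

$$\begin{pmatrix} x_{1,n+1} \\ x_{2,n+1} \end{pmatrix} = \begin{pmatrix} x_{1,n} \\ x_{2,n} \end{pmatrix} - \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{x_{2,n}^2}{2 - 2\log x_{2,n}} \end{pmatrix} \begin{pmatrix} 2x_{1,n} \\ \frac{2 \log x_{2,n}}{x_{2,n}} \end{pmatrix} = \begin{pmatrix} 0 \\ x_{2,n} - \frac{x_{2,n} \log x_{2,n}}{1 - \log x_{2,n}} \end{pmatrix}.$$

Notice that $x_1$ goes in one step to zero, because a quadratic function is
exactly minimized by Newton's method.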
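
To compare Newton's method with gradient descent on Example 11, the sketch
below counts how many steps each takes to bring the gradient norm below a
tolerance. The starting point $(3, 1.5)$, the step size $\alpha = 0.1$, the
tolerance and the helper names are all assumptions made for illustration.
The Hessian is diagonal here, so the Newton step is applied component by
component.

```python
import math


def grad(x1, x2):
    # Gradient of F(x1, x2) = x1^2 + (log x2)^2.
    return 2.0 * x1, 2.0 * math.log(x2) / x2


def newton_step(x1, x2):
    g1, g2 = grad(x1, x2)
    h11 = 2.0
    h22 = (2.0 - 2.0 * math.log(x2)) / (x2 * x2)
    return x1 - g1 / h11, x2 - g2 / h22


def gradient_step(x1, x2, alpha=0.1):
    g1, g2 = grad(x1, x2)
    return x1 - alpha * g1, x2 - alpha * g2


def iterate(step, x1, x2, tol=1e-10, max_iter=1000):
    """Apply `step` until the gradient norm is below tol; return (x1, x2, count)."""
    for n in range(max_iter):
        g1, g2 = grad(x1, x2)
        if math.hypot(g1, g2) < tol:
            return x1, x2, n
        x1, x2 = step(x1, x2)
    return x1, x2, max_iter


newton_result = iterate(newton_step, 3.0, 1.5)
descent_result = iterate(gradient_step, 3.0, 1.5)
print("Newton:          ", newton_result)
print("gradient descent:", descent_result)
```

Both runs approach the minimizer $(0, 1)$, and with these choices Newton's
method needs far fewer steps. A starting value of $x_2$ further from 1 can
send the Newton iterate to a negative $x_2$, where the logarithm is
undefined, which illustrates the need for a good initial guess.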




MIT OpenCourseWare 
http://ocw.mit.edu 



18.330 Introduction to Numerical Analysis 

Spring 2012 



For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms . 



